Skip to content

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Aug 27, 2025

Split off of #8845

Adds and tests queries which will be used in integration (reading on boot): #8925

Does not actually flip Nexus to use these records yet.

Depends on #8924

Next part of #8501: Adding queries for these records

@smklein smklein changed the title (2/N) Db metadata nexus queries (2/N) db_metadata_nexus queries Aug 27, 2025
@smklein smklein force-pushed the db_metadata_nexus_queries branch from e301683 to b3e696e Compare August 27, 2025 23:33
@smklein smklein force-pushed the db_metadata_nexus_queries branch from b3e696e to 20fafc3 Compare August 28, 2025 22:53
@smklein smklein force-pushed the db_metadata_nexus_records branch from 6e20a24 to 23a3335 Compare August 28, 2025 22:53
@smklein smklein force-pushed the db_metadata_nexus_queries branch from 20fafc3 to b884dfa Compare August 28, 2025 23:03
@smklein smklein changed the base branch from db_metadata_nexus_records to nexus_gen_schema August 28, 2025 23:07
@smklein smklein changed the title (2/N) db_metadata_nexus queries (3/N) db_metadata_nexus queries Aug 28, 2025
@smklein smklein marked this pull request as ready for review August 28, 2025 23:13
@smklein smklein force-pushed the db_metadata_nexus_queries branch from b884dfa to a91fb35 Compare August 29, 2025 01:25
@smklein smklein force-pushed the db_metadata_nexus_queries branch from a91fb35 to 0a20282 Compare August 29, 2025 15:48
@smklein smklein force-pushed the db_metadata_nexus_queries branch from 0a20282 to 8e93726 Compare August 29, 2025 21:05
@smklein smklein force-pushed the db_metadata_nexus_queries branch from 8e93726 to 2dba0e1 Compare August 29, 2025 21:22
/// Describes the state of the database access with respect this Nexus
#[derive(Debug, Copy, Clone, PartialEq)]
enum NexusAccess {
/// Nexus does not yet have access to the database.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Nexus does not yet have access to the database.
/// Nexus does not yet have access to the database, but can take over when current-generation Nexus instances quiesce

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6034c51

/// Nexus does not yet have access to the database.
DoesNotHaveAccessYet { nexus_id: OmicronZoneUuid },

/// Nexus has been explicitly locked out of the database.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Nexus has been explicitly locked out of the database.
/// Nexus has been permanently, explicitly locked out of the database.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6034c51

/// Start a schema update
Update,

/// Refuse to use the database
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Refuse to use the database
/// Permanently refuse to use the database

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6034c51

Comment on lines 245 to 246
// - Systems that haven't been migrated to include nexus access control
// (we need access to the database to backfill these records).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case isn't necessary because of the schema migration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in an older version, this check was taken even for the schema-updater binary, but that's no longer the case with the IdentityCheckPolicy::DontCare option.

Updated in 6034c51

return Ok(NexusAccess::HasImplicitAccess);
}

// Records exist, so enforce the access control check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Records exist, so enforce the access control check
// Records exist, so enforce the identity check

We didn't really talk about this but I've been starting to use the term "identity check" rather than "access control" to avoid confusing it with IAM/RBAC/authz sort of stuff. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me; definitely reasonable to avoid "access control" as a term because that's overloaded.

Updated in 6034c51

let msg = "Nexus does not have access to the database (no \
db_metadata_nexus record)";
warn!(&self.log, "{msg}"; "nexus_id" => ?nexus_id);
return Ok(NexusAccess::DoesNotHaveAccessYet { nexus_id });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be a different variant because right now it's indistinguishable from the case of finding a NotYet record (L287). It looks to me like we'll wind up creating a DatastoreSetupAction::NeedsHandoff, which is not correct for this case. I believe the correct answer is to wait a bit and check the whole thing again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6034c51 to add a new variant here, and update tests.

This will also get propagated into #8925, where we react to the new TryLater action

Base automatically changed from nexus_gen_schema to main August 30, 2025 14:43
Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly just tiny doc nits.

enum NexusAccess {
/// Nexus does not yet have access to the database, but can take over when
/// the current-generation Nexus instances quiesce.
DoesNotHaveAccessYet { nexus_id: OmicronZoneUuid },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this our own nexus_id? Why do we carry it in this variant and not the others?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a follow-up from #8845 (comment)

In short: NexusAccess::DoesNotHaveAccessYet can result in us returning DatastoreSetupAction::NeedsHandoff to a user. In this case, they must have a Nexus ID to use to initiate handoff.

There was an error case I previously handled at runtime, to the effect of "if a client calls check_schema_and_access, and gets back the NeedsHandoff action, but does not have a Nexus ID, then throw an error". But this case is impossible - we only return that result when an explicit Nexus ID is passed through IdentityCheckPolicy -> NexusAccess -> DatastoreSetupAction.

Rather than having an impossible-to-cover conditional, I just passed the Nexus UUID through the endpoint that needs it, for the purposes of feeding-forward to the output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants